    Characterizing the Temperature of SAT Formulas

    The remarkable advances in SAT solving achieved in recent years have made it possible to apply this technology to many real-world problems, such as planning, formal verification and cryptography, among others. Interestingly, these industrial SAT problems are commonly believed to be easier than classical random SAT formulas, but estimating their actual hardness is still a very challenging question, which in some cases even requires solving them. In this context, realistic pseudo-industrial random SAT generators have emerged with the aim of reproducing the main features of these application problems, in order to better understand the success of SAT solving techniques on them. In this work, we present a model to estimate the temperature of real-world SAT instances. This temperature represents the degree of distortion of the expected structure of the formula, from highly structured benchmarks (more similar to real-world SAT instances) to the complete absence of structure (observed in the classical random SAT model). Our solution is based on the Popularity-Similarity random model for SAT, which was recently introduced to reproduce two crucial features of application SAT benchmarks: scale-free and community structures. This model is able to control the hardness of the generated formula by introducing randomization into the expected structure. Using our regression model, we observe that the estimated temperature of the application benchmarks used in recent SAT Competitions correlates with their hardness in most cases.
    Funding: Juan de la Cierva program, fellowship IJC2019-040489-I, funded by MCIN and AE
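
    The abstract does not detail the regression itself; as a rough, hypothetical illustration of the idea only, the sketch below fits a linear model from two structural features of a formula's variable-interaction graph (community modularity and a degree-distribution exponent, both assumed feature choices, not the authors' features) to the known temperatures of generated training formulas.

```python
# Toy sketch: regress a "temperature" label from structural features of
# SAT formulas. Features and data are illustrative assumptions; the
# paper's actual regression model is not specified in the abstract.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training set: one row per generated benchmark formula.
# Columns: [community modularity, power-law exponent of degree distribution]
X = np.array([
    [0.85, 2.1],   # highly structured, industrial-like
    [0.60, 2.6],
    [0.35, 3.2],
    [0.10, 4.0],   # close to uniform random
])
# Temperatures used to generate the training formulas (known by construction).
y = np.array([0.1, 0.4, 0.7, 1.0])

model = LinearRegression().fit(X, y)

# Estimate the temperature of an unseen formula from its features.
unseen = np.array([[0.72, 2.3]])
print(f"estimated temperature: {model.predict(unseen)[0]:.2f}")
```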

    Improving Skip-Gram based Graph Embeddings via Centrality-Weighted Sampling

    Network embedding techniques inspired by word2vec represent an effective unsupervised relational learning model. Commonly, by means of a Skip-Gram procedure, these techniques learn low-dimensional vector representations of the nodes in a graph by sampling node-context examples. Although many ways of sampling the context of a node have been proposed, the effects of how the node itself is chosen have not been analyzed in depth. To fill this gap, we have re-implemented the four main word2vec-inspired graph embedding techniques under the same framework and analyzed how different sampling distributions affect embedding performance on node classification problems. We present a set of experiments on several well-known real data sets showing that the use of popular centrality distributions in sampling leads to improvements, achieving speedups of up to 2× in learning time and increased accuracy in all cases.
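
    The abstract does not pin down the sampling scheme; one plausible reading, sketched below under that assumption, is to draw random-walk start nodes in proportion to a centrality score (here PageRank) instead of uniformly, before emitting Skip-Gram (center, context) pairs. The function names are illustrative, not the paper's implementation.

```python
# Sketch: bias the choice of walk start nodes by node centrality
# (here PageRank) rather than sampling nodes uniformly, then emit
# (center, context) pairs for a Skip-Gram model. Illustrative only.
import random
import networkx as nx

def centrality_weighted_walks(G, num_walks=100, walk_len=10):
    centrality = nx.pagerank(G)            # any centrality measure works here
    nodes = list(G.nodes())
    weights = [centrality[n] for n in nodes]
    walks = []
    for _ in range(num_walks):
        # Start node drawn proportionally to its centrality score.
        node = random.choices(nodes, weights=weights, k=1)[0]
        walk = [node]
        for _ in range(walk_len - 1):
            neighbors = list(G.neighbors(node))
            if not neighbors:
                break
            node = random.choice(neighbors)
            walk.append(node)
        walks.append(walk)
    return walks

G = nx.karate_club_graph()
walks = centrality_weighted_walks(G)
# Each walk yields Skip-Gram (center, context) pairs within a window of 2.
pairs = [(w[i], w[j]) for w in walks
         for i in range(len(w)) for j in range(len(w))
         if i != j and abs(i - j) <= 2]
print(len(pairs), "training pairs")
```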

    Knowledge discovery in multi-relational graphs

    Given the limited range of methodologies for carrying out relational machine learning tasks, the main goal of this thesis is to analyse the existing methods, modifying or optimizing some of them where possible, and to contribute new methods that open new ways to approach this difficult task. To this end, and leaving aside goals related to literature reviews or comparisons between models and implementations, a series of concrete objectives are set out:
    1. Define flexible and powerful structures that allow modelling phenomena in terms of their constituent elements and the relations established between them. These structures must be able to naturally express complex properties of the elements (continuous or categorical values, vectors, matrices, dictionaries, graphs, ...), as well as heterogeneous relations between them that may in turn carry the same level of complex properties. Moreover, these structures must allow modelling phenomena in which the relations between elements are not always binary (involving only two elements) but may involve any number of them (see the sketch after this abstract).
    2. Define tools to build, manipulate and measure these structures. However powerful and flexible a structure may be, it is of little use without adequate tools to manipulate and study it. These tools must be efficiently implemented and must cover construction and querying tasks.
    3. Develop new black-box relational machine learning algorithms. In tasks where the goal is not to obtain explanatory models, black-box models are acceptable, sacrificing interpretability in favour of greater computational efficiency.
    4. Develop new white-box relational machine learning algorithms. When an explanation of how the analysed systems work is of interest, white-box machine learning models are sought.
    5. Improve the querying, analysis and repair tools for databases. Some long-range queries in databases carry too high a computational cost, which prevents adequate analyses in some information systems. Moreover, graph databases lack methods to normalize or repair data automatically or under human supervision. It is worth developing tools for this kind of task that increase efficiency and offer a new query and normalization layer to curate the data for more efficient storage and retrieval.
    All of these objectives are developed on a solid formal basis grounded in Information Theory, Learning Theory, Artificial Neural Network Theory and Graph Theory. This basis makes the results formal enough for the contributions to be easily evaluated. Moreover, the abstract models developed are readily implementable on real machines, so that their behaviour can be verified experimentally and useful solutions can be offered to the scientific community within a short time.
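
    A minimal sketch of the kind of structure objective 1 asks for: nodes and n-ary relations (hyperedges), each carrying arbitrary complex properties. The class and field names are hypothetical and do not reproduce the thesis's actual formalism.

```python
# Minimal sketch of a property-rich generalized graph: nodes and
# n-ary relations (hyperedges), each with arbitrary properties.
# Names are hypothetical; this is not the thesis's actual formalism.
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Node:
    node_id: str
    properties: dict[str, Any] = field(default_factory=dict)

@dataclass
class Relation:
    rel_type: str
    members: tuple[str, ...]          # any number of nodes, not just two
    properties: dict[str, Any] = field(default_factory=dict)

@dataclass
class MultiRelationalGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    relations: list[Relation] = field(default_factory=list)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = Node(node_id, props)

    def add_relation(self, rel_type, members, **props):
        self.relations.append(Relation(rel_type, tuple(members), props))

g = MultiRelationalGraph()
g.add_node("alice", age=30, embedding=[0.1, 0.9])   # complex node properties
g.add_node("bob")
g.add_node("carol")
# Ternary relation carrying its own properties.
g.add_relation("meeting", ["alice", "bob", "carol"], date="2020-01-15")
print(len(g.nodes), "nodes,", len(g.relations), "relations")
```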

    Detecting the ultra low dimensionality of real networks

    Reducing dimension redundancy to find simplifying patterns in high-dimensional datasets and complex networks has become a major endeavor in many scientific fields. However, detecting the dimensionality of their latent space is challenging but necessary to generate efficient embeddings to be used in a multitude of downstream tasks. Here, we propose a method to infer the dimensionality of networks without the need for any a priori spatial embedding. Due to the ability of hyperbolic geometry to capture the complex connectivity of real networks, we detect ultra-low dimensionality far below values reported using other approaches. We applied our method to real networks from different domains and found unexpected regularities, including: tissue-specific biomolecular networks being extremely low dimensional; brain connectomes being close to the three dimensions of their anatomical embedding; and social networks and the Internet requiring slightly higher dimensionality. Beyond paving the way towards an ultra-efficient dimensional reduction, our findings help address fundamental issues that hinge on dimensionality, such as universality in critical behavior.
    Funding: Agencia Estatal de Investigación PID2019-106290GB-C22/AEI/10.13039/501100011033; Generalitat de Catalunya 2017SGR106
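
    The paper's method relies on hyperbolic geometry, which the abstract does not detail. The sketch below only illustrates the generic model-selection loop behind latent-dimension detection, using a Euclidean truncated-SVD embedding as a stand-in; it is not the authors' procedure.

```python
# Generic illustration of latent-dimension detection: embed the network
# at increasing dimension and watch the reconstruction error plateau.
# The paper's actual method uses hyperbolic geometry; this Euclidean
# SVD sketch only conveys the model-selection idea.
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)
U, s, Vt = np.linalg.svd(A)

for d in range(1, 8):
    # Rank-d reconstruction of the adjacency matrix.
    A_d = U[:, :d] @ np.diag(s[:d]) @ Vt[:d, :]
    err = np.linalg.norm(A - A_d) / np.linalg.norm(A)
    print(f"d={d}: relative reconstruction error {err:.3f}")
# A sharp flattening of the error curve hints at the latent dimension.
```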

    Generador de Grafos Multi-relacionales a partir de redes sociales

    The tool introduced in this paper, CorpuRed, builds datasets from online social networks for use in research projects that require information about social behaviour on the Internet. The way the data are obtained is slightly platform dependent (the Facebook case is described), and they are stored in a graph database accessible through an API under an academic license.

    On the Temperature of SAT Formulas

    The remarkable advances in SAT solving achieved in recent years have made it possible to apply this technology to many real-world applications of Artificial Intelligence, such as planning, formal verification, and scheduling, among others. Interestingly, these industrial SAT problems are commonly believed to be easier than classical random SAT formulas, but estimating their actual hardness is still a very challenging question, which in some cases even requires solving them. In this context, realistic pseudo-industrial random SAT generators have emerged with the aim of reproducing the main features shared by the majority of these application problems. The study of these models may help to better understand the success of SAT solving techniques and possibly improve them. In this work, we present a model to estimate the temperature of real-world SAT instances. This temperature represents the degree of distortion of the expected structure of the formula, from highly structured benchmarks (more similar to real-world SAT instances) to the complete absence of structure (observed in the classical random SAT model). Our solution is based on the Popularity-Similarity (PS) random model for SAT, which was recently introduced to reproduce two crucial features of application SAT benchmarks: scale-free and community structures. The PS model is able to control the hardness of the generated formula by introducing randomization into the expected structure. Our solution is a first step towards a hardness oracle based on the temperature of SAT formulas, which may be able to estimate the cost of solving real-world SAT instances without solving them.
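
    As a toy illustration of temperature-controlled structure distortion (not the actual Popularity-Similarity generator), the sketch below interpolates between community-structured and uniformly random clause construction, with a temperature parameter in [0, 1] setting the probability that each literal ignores its clause's home community.

```python
# Toy illustration of temperature-controlled structure distortion in a
# random SAT generator: each literal is drawn from the clause's "home"
# community with probability (1 - temperature), or uniformly at random
# otherwise. This is NOT the actual Popularity-Similarity model, and
# duplicate literals within a clause are possible in this toy version.
import random

def generate_formula(n_vars=100, n_clauses=200, k=3,
                     n_communities=5, temperature=0.2):
    comm_size = n_vars // n_communities
    clauses = []
    for _ in range(n_clauses):
        home = random.randrange(n_communities)   # clause's community
        clause = []
        for _ in range(k):
            if random.random() < temperature:
                var = random.randrange(1, n_vars + 1)              # no structure
            else:
                var = home * comm_size + random.randrange(1, comm_size + 1)
            clause.append(var if random.random() < 0.5 else -var)  # random sign
        clauses.append(clause)
    return clauses

# temperature=0 keeps clauses inside communities (structured, industrial-like);
# temperature=1 recovers the classical uniform random model.
print(generate_formula(temperature=0.0)[:2])
print(generate_formula(temperature=1.0)[:2])
```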

    A Multi-Relational Graph Generator Based-on Social Networks Data

    The tool introduced in this paper, CorpuRed, builds datasets from online social networks for use in research projects that require information about social behaviour on the Internet. The way the data are obtained is slightly platform dependent (the Facebook case is described), and they are stored in a graph database accessible through an API under an academic license.
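
    The abstract only states that the collected data end up in a graph database behind an API; no schema is given. The sketch below shows the kind of insertions such a pipeline might perform, with hypothetical record fields and store methods (the real CorpuRed interface is not shown in this text).

```python
# Minimal sketch of the kind of graph-store insertions a collector like
# CorpuRed might perform after fetching platform data. The record fields
# and store methods are hypothetical, not the tool's real API.
class GraphStore:
    def __init__(self):
        self.nodes = {}      # node_id -> properties
        self.edges = []      # (source, rel_type, target, properties)

    def add_user(self, user_id, **props):
        self.nodes.setdefault(user_id, {}).update(props)

    def add_interaction(self, src, rel_type, dst, **props):
        self.edges.append((src, rel_type, dst, props))

store = GraphStore()
# Hypothetical records fetched from a social platform (e.g. Facebook).
store.add_user("u1", joined="2015-03")
store.add_user("u2")
store.add_interaction("u1", "COMMENTED_ON", "u2", when="2020-01-15")
print(len(store.nodes), "nodes,", len(store.edges), "edges")
```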